CHAPTER 18 A Yes-or-No Proposition: Logistic Regression 267
Don’t misinterpret odds ratios for numerical predictors
The OR always represents the factor by which the odds of the outcome event
increase when the predictor increases by exactly one unit of measure, whatever
that unit may be. Sometimes you may want to express the OR in more convenient
units than the ones the data was recorded in. For the example in
Table 18-1, the OR for dose as a predictor of death is 1.0115 per REM. This isn’t too
meaningful because one REM is a very small increment of radiation. By raising
1.0115 to the 100th power, you get the equivalent OR of 3.1375 per 100 REMs, and
you can express this as, “Every additional 100 REMs of radiation more than triples
the odds of dying.”
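You can check that unit conversion with a couple of lines of Python (the 1.0115 figure is the per-REM OR from Table 18-1):

```python
import math

# OR per 1 REM, taken from Table 18-1
or_per_rem = 1.0115

# Raising the OR to the 100th power re-expresses it per 100 REMs
or_per_100_rem = or_per_rem ** 100
print(or_per_100_rem)  # roughly 3.14 — more than triple the odds
```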
The value of a regression coefficient depends on the units in which the corre-
sponding predictor variable is expressed. So the coefficient of a height variable
expressed in meters is 100 times larger than the coefficient of height expressed in
centimeters. In logistic regression, ORs are obtained by exponentiating the coef-
ficients, so switching from centimeters to meters corresponds to raising the OR
(and its confidence limits) to the 100th power.
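Here's a quick sketch of that relationship, using a made-up coefficient of 0.02 per centimeter for illustration:

```python
import math

# Hypothetical logistic regression coefficient for height in centimeters
b_cm = 0.02
or_per_cm = math.exp(b_cm)

# Re-expressing height in meters multiplies the coefficient by 100...
b_m = 100 * b_cm
or_per_m = math.exp(b_m)

# ...which is the same as raising the per-centimeter OR to the 100th power
assert math.isclose(or_per_m, or_per_cm ** 100)
```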
Beware of the complete separation problem
Imagine your logistic regression model perfectly predicted the outcome, in that
every individual positive for the outcome had a predicted probability of 1.0, and
every individual negative for the outcome had a 0 predicted probability. This is
called perfect separation or complete separation, and the problem is called the
perfect predictor problem. This nasty and surprisingly common problem is unique
to logistic regression, and it highlights an ironic fact: the software fails to
converge precisely when the model fits the data perfectly!
If the predictor variable or variables in your model completely separate the yes
outcomes from the no outcomes, the maximum likelihood method will try to make
the coefficient of that variable infinite, which usually causes an error in the soft-
ware. If the coefficient is positive, the OR tries to be infinity, and if it is negative,
it tries to be 0. The SE of the OR tries to be infinite, too. This may cause your CI to
have a lower limit of 0, an upper limit of infinity, or both.
Check out Figure 18-8, which visually describes the problem. The regression is
trying to make the curve come as close as possible to all the data points. Usually it
has to strike a compromise, because there’s a mixture of 1s and 0s, especially in
the middle of the data. But with perfectly separated data, no compromise is neces-
sary. As b becomes infinitely large, the logistic function morphs into a step func-
tion that touches all the data points (observe where b = 5).
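You can see why no compromise exists with a small Python sketch on made-up, perfectly separated data: the log-likelihood keeps improving as the slope b grows, so there's no finite best value for the fitting algorithm to settle on.

```python
import math

# Toy perfectly separated data: every x < 0 has outcome 0, every x > 0 has outcome 1
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

def log_likelihood(b):
    # Log-likelihood of a logistic model with slope b and no intercept
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-b * x))  # predicted probability of outcome 1
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# The likelihood never stops improving as b grows, so maximum likelihood
# tries to push b (and the OR) toward infinity
assert log_likelihood(1) < log_likelihood(5) < log_likelihood(10)
```

With any mixture of 1s and 0s in the middle of the data, the likelihood would instead peak at some finite b, and the fit would converge normally.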